AITopics | condensed dataset

FairDD: Fair Dataset Distillation

Neural Information Processing Systems, Jun-23-2026, 00:02:51 GMT

Condensing large datasets into smaller synthetic counterparts has demonstrated its promise for image classification. However, previous research has overlooked a crucial concern in image recognition: ensuring that models trained on condensed datasets are unbiased towards protected attributes (PA), such as gender and race. Our investigation reveals that dataset distillation fails to alleviate the unfairness towards minority groups within original datasets.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

Neural Information Processing Systems, Jun-16-2026, 14:30:03 GMT

Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensation dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce SNEAKDOOR, which enhances stealthiness without compromising attack effectiveness. SNEAKDOOR exploits the inherent vulnerability of class decision boundaries and incorporates a generative module that constructs input-aware triggers aligned with local feature geometry, thereby minimizing detectability. This joint design enables the attack to remain imperceptible to both human inspection and statistical detection. Extensive experiments across multiple datasets demonstrate that SNEAKDOOR achieves a compelling balance among attack success rate, clean test accuracy, and stealthiness, substantially improving the invisibility of both the synthetic data and triggered samples while maintaining high attack efficacy.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

DC-BENCH: Dataset Condensation Benchmark

Neural Information Processing Systems, Apr-24-2026, 09:16:34 GMT

Dataset Condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow increasingly large, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and remains an open issue. The quality of a condensed dataset is often overshadowed by other factors that critically affect end performance, such as data augmentation and model architecture. The lack of a systematic way to evaluate and compare condensation methods not only hinders our understanding of existing techniques, but also discourages practical use of the synthesized datasets. This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of their generated datasets. Leveraging this benchmark, we conduct a large-scale study of current condensation methods and report many insightful findings that open up new possibilities for future development. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced to facilitate future research and application.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology: